{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Part1_mnist_solutions.ipynb",
      "version": "0.3.2",
      "provenance": [],
      "collapsed_sections": []
    },
    "kernelspec": {
      "name": "python2",
      "display_name": "Python 2"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "metadata": {
        "id": "Xmf_JRJa_N8C",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "<table align=\"center\">\n",
        "  <td align=\"center\"><a target=\"_blank\" href=\"http://introtodeeplearning.com\">\n",
        "        <img src=\"http://introtodeeplearning.com/images/colab/mit.png\" style=\"padding-bottom:5px;\" />\n",
        "      Visit MIT Deep Learning</a></td>\n",
        "  <td align=\"center\"><a target=\"_blank\" href=\"https://colab.research.google.com/github/aamini/introtodeeplearning_labs/blob/master/lab2/Part1_mnist.ipynb\">\n",
        "        <img src=\"http://introtodeeplearning.com/images/colab/colab.png?v2.0\"  style=\"padding-bottom:5px;\" />Run in Google Colab</a></td>\n",
        "  <td align=\"center\"><a target=\"_blank\" href=\"https://github.com/aamini/introtodeeplearning_labs/blob/master/lab2/Part1_mnist.ipynb\">\n",
        "        <img src=\"http://introtodeeplearning.com/images/colab/github.png\"  height=\"70px\" style=\"padding-bottom:5px;\"  />View Source on GitHub</a></td>\n",
        "</table>\n",
        "\n",
        "\n",
        "# Laboratory 2: Computer Vision\n",
        "\n",
        "# Part 1: MNIST Digit Classification\n",
        "\n",
        "In the first portion of this lab, we will build and train a convolutional neural network (CNN) for classification of handwritten digits from the famous [MNIST](http://yann.lecun.com/exdb/mnist/) dataset. The MNIST dataset consists of 60,000 training images and 10,000 test images. Our classes are the digits 0-9.\n",
        "\n",
        "First we'll import TensorFlow, enable Eager execution, and also import some dependencies."
      ]
    },
    {
      "metadata": {
        "id": "RsGqx_ai_N8F",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "import tensorflow as tf\n",
        "tf.enable_eager_execution()\n",
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "import random\n",
        "from progressbar import progressbar\n",
        "\n",
        "# Download the class repository\n",
        "! git clone https://github.com/aamini/introtodeeplearning_labs.git  > /dev/null 2>&1\n",
        "% cd introtodeeplearning_labs \n",
        "! git pull\n",
        "% cd .. \n",
        "\n",
        "# Import the necessary class-specific utility files for this lab\n",
        "import introtodeeplearning_labs as util"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "HKjrdUtX_N8J",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "## 1.1 MNIST dataset \n",
        "\n",
        "Let's download and load the dataset and display a few random samples from it:"
      ]
    },
    {
      "metadata": {
        "id": "p2dQsHI3_N8K",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "mnist = tf.keras.datasets.mnist\n",
        "(train_images, train_labels), (test_images, test_labels) = mnist.load_data()\n",
        "train_images = np.expand_dims(train_images, axis=-1)/255.\n",
        "train_labels = np.int64(train_labels)\n",
        "test_images = np.expand_dims(test_images, axis=-1)/255.\n",
        "test_labels = np.int64(test_labels)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "5ZtUqOqePsRD",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Our training set is made up of 28x28 grayscale images of handwritten digits. \n",
        "\n",
        "Let's visualize what some of these images and their corresponding training labels look like."
      ]
    },
    {
      "metadata": {
        "scrolled": true,
        "id": "bDBsR2lP_N8O",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "plt.figure(figsize=(10,10))\n",
        "random_inds = np.random.choice(60000,36)\n",
        "for i in range(36):\n",
        "    plt.subplot(6,6,i+1)\n",
        "    plt.xticks([])\n",
        "    plt.yticks([])\n",
        "    plt.grid(False)\n",
        "    image_ind = random_inds[i]\n",
        "    plt.imshow(np.squeeze(train_images[image_ind]), cmap=plt.cm.binary)\n",
        "    plt.xlabel(train_labels[image_ind])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "V6hd3Nt1_N8q",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "## 1.2 Neural Network for Handwritten Digit Classification\n",
        "\n",
        "We'll first build a simple neural network consisting of two fully connected layers and apply this to the digit classification task. Our network will ultimately output a probability distribution over the 10 digit classes (0-9). This first architecture we will be building is depicted below:\n",
        "\n",
        "![alt_text](img/mnist_2layers_arch.png \"CNN Architecture for MNIST Classification\")\n"
      ]
    },
    {
      "metadata": {
        "id": "rphS2rMIymyZ",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "### Fully connected neural network architecture\n",
        "To define the architecture of this first fully connected neural network, we'll once again use the Keras API and define the model using the [`Sequential`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential) class. Note how we first use a [`Flatten`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) layer, which flattens the input so that it can be fed into the model. \n",
        "\n",
        "In this next block, you'll define the output layer -- the second fully connected of this simple network."
      ]
    },
    {
      "metadata": {
        "id": "MMZsbjAkDKpU",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "def build_fc_model():\n",
        "  fc_model = tf.keras.Sequential([\n",
        "      # First define a Flatten layer\n",
        "      tf.keras.layers.Flatten(),\n",
        "      # '''TODO: Define the activation function for the first fully connected layer.'''\n",
        "      tf.keras.layers.Dense(128, '''TODO'''), # TODO \n",
        "      # '''TODO: Define the second Dense layer to output the classification probabilities'''\n",
        "      tf.keras.layers.Dense('''TODO''', '''TODO''') # TODO \n",
        "  ])\n",
        "  return fc_model\n",
        "\n",
        "model = build_fc_model()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "VtGZpHVKz5Jt",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "As we progress through this next portion, you may find that you'll want to make changes to the architecture defined above. **Note that in order to update the model later on, you'll need to re-run the above cell to re-initialize the model. **"
      ]
    },
    {
      "metadata": {
        "id": "mVN1_AeG_N9N",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Let's take a step back and think about the network we've just created. The first layer in this network, `tf.keras.layers.Flatten`, transforms the format of the images from a 2d-array (28 x 28 pixels), to a 1d-array of 28 * 28 = 784 pixels. You can think of this layer as unstacking rows of pixels in the image and lining them up. There are no learned parameters in this layer; it only reformats the data.\n",
        "\n",
        "After the pixels are flattened, the network consists of a sequence of two `tf.keras.layers.Dense` layers. These are fully-connected neural layers. The first `Dense` layer has 128 nodes (or neurons). The second (and last) layer (which you've defined!) should return an array of probability scores that sum to 1. Each node contains a score that indicates the probability that the current image belongs to one of the handwritten digit classes.\n",
        "\n",
        "That defines our fully connected model! "
      ]
    },
    {
      "metadata": {
        "id": "gut8A_7rCaW6",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "\n",
        "\n",
        "### Compile the model\n",
        "\n",
        "Before training the model, we need to define a few more settings. These are added during the model's [`compile`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#compile) step:\n",
        "\n",
        "* *Loss function* — This defines how we measure how accurate the model is during training. As was covered in lecture, during training we want to minimize this function, which will \"steer\" the model in the right direction.\n",
        "* *Optimizer* — This defines how the model is updated based on the data it sees and its loss function.\n",
        "* *Metrics* — Here we can define metrics used to monitor the training and testing steps. In this example, we'll look at the *accuracy*, the fraction of the images that are correctly classified.\n",
        "\n",
        "We'll start out by using a stochastic gradient descent (SGD) optimizer initialized with a learning rate of 0.1. Since we are performing a categorical classification task, we'll want to use the [cross entropy loss](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/sparse_categorical_crossentropy).\n",
        "\n",
        "You'll want to experiment with both the choice of optimizer and learning rate and evaluate how these affect the accuracy of the trained model. "
      ]
    },
    {
      "metadata": {
        "id": "Lhan11blCaW7",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "'''TODO: Experiment with different optimizers and learning rates. How do these affect\n",
        "    the accuracy of the trained model? Which optimizers and/or learning rates yield\n",
        "    the best performance?'''\n",
        "model.compile(optimizer=tf.train.GradientDescentOptimizer(learning_rate=1e-1), \n",
        "              loss='sparse_categorical_crossentropy',\n",
        "              metrics=['accuracy'])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "qKF6uW-BCaW-",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "### Train the model\n",
        "\n",
        "We're now ready to train our model, which will involve feeding the training data (`train_images` and `train_labels`) into the model, and then asking it to learn the associations between images and labels. We'll also need to define the batch size and the number of epochs, or iterations over the MNIST dataset, to use during training. With the Keras API and defining the model settings in the `compile` step, training is all accomplished by calling the [`fit`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#fit) method on an instance of the Model class. \n"
      ]
    },
    {
      "metadata": {
        "id": "EFMbIqIvQ2X0",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "# Define the batch size and the number of epochs to use during training\n",
        "BATCH_SIZE = 64\n",
        "EPOCHS = 5"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "xvwvpA64CaW_",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "model.fit(train_images, train_labels, batch_size=BATCH_SIZE, epochs=EPOCHS)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "W3ZVOhugCaXA",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "As the model trains, the loss and accuracy metrics are displayed. With five epochs and a learning rate of 0.01, this fully connected model should achieve an accuracy of approximatley 0.97 (or 97%) on the training data."
      ]
    },
    {
      "metadata": {
        "id": "oEw4bZgGCaXB",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "### Evaluate accuracy on the test dataset\n",
        "\n",
        "Now that we've trained the model, we can ask it to make predictions about a test set that it hasn't seen before. In this example, the `test_images` array comprises our test dataset. To evaluate accuracy, we can check to see if the model's predictions match the labels from the `test_labels` array. \n",
        "\n",
        "Use the [`evaluate`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#evaluate) method to evaluate the model on the test dataset!"
      ]
    },
    {
      "metadata": {
        "id": "VflXLEeECaXC",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "'''TODO: Use the evaluate method to test the model!'''\n",
        "test_loss, test_acc = # TODO\n",
        "\n",
        "print('Test accuracy:', test_acc)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "yWfgsmVXCaXG",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "You may observe that the accuracy on the test dataset is a little lower than the accuracy on the training dataset. This gap between training accuracy and test accuracy is an example of *overfitting*, when a machine learning model performs worse on new data than on its training data. \n",
        "\n",
        "What is the highest accuracy you can achieve with this first fully connected model? Since the handwritten digit classification task is pretty straightforward, you may be wondering how we can do better...\n",
        "\n",
        "![Deeper...](https://i.kym-cdn.com/photos/images/newsfeed/000/534/153/f87.jpg)"
      ]
    },
    {
      "metadata": {
        "id": "baIw9bDf8v6Z",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "## 1.3 Convolutional Neural Network (CNN) for handwritten digit classification"
      ]
    },
    {
      "metadata": {
        "id": "_J72Yt1o_fY7",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "As we saw in lecture, convolutional neural networks (CNNs) are particularly well-suited for a variety of tasks in computer vision, and have achieved near-perfect accuracies on the MNIST dataset. We will now build a CNN composed of two convolutional layers and pooling layers, followed by two fully connected layers, and ultimately output a probability distribution over the 10 digit classes (0-9). The CNN we will be building is depicted below:\n",
        "\n",
        "![alt_text](img/convnet_fig.png \"CNN Architecture for MNIST Classification\")"
      ]
    },
    {
      "metadata": {
        "id": "EEHqzbJJAEoR",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "### Define the CNN model\n",
        "\n",
        "We'll use the same training and test datasets as before, and proceed similarly as our fully connected network to define and train our new CNN model. \n",
        "\n",
        "You can use  [`keras.layers.Conv2D` ](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D) to define convolutional layers and [`keras.layers.MaxPool2D`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D) to define the pooling layers. Use the parameters shown in the network architecture above to define these layers and build the CNN model!"
      ]
    },
    {
      "metadata": {
        "id": "vec9qcJs-9W5",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "def build_cnn_model():\n",
        "    cnn_model = tf.keras.Sequential([\n",
        "\n",
        "        tf.keras.layers.Conv2D(filters=24, kernel_size=(3,3), input_shape=(28, 28, 1), activation=tf.nn.relu), # TODO        \n",
        "\n",
        "\n",
        "        tf.keras.layers.MaxPool2D(pool_size=(2,2)), \n",
        "\n",
        "        #'''TODO: Define the second convolutional layer'''\n",
        "        tf.keras.layers.Conv2D('''TODO'''), # TODO\n",
        "\n",
        "        #'''TODO: Define the second max pooling layer'''\n",
        "        tf.keras.layers.MaxPool2D('''TODO'''), # TODO\n",
        "\n",
        "        tf.keras.layers.Flatten(),\n",
        "        tf.keras.layers.Dense(128, activation=tf.nn.relu),\n",
        "        #'''TODO: Define the last Dense layer'''\n",
        "        tf.keras.layers.Dense('''TODO''') # TODO\n",
        "    ])\n",
        "    return cnn_model\n",
        "  \n",
        "cnn_model = build_cnn_model()\n",
        "print(cnn_model.summary())"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "kUAXIBynCih2",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "### Train and test the CNN model\n",
        "\n",
        "Now, as before, we can define the loss function, optimizer, and metrics through the `compile` method. Compile the CNN model with an optimizer and learning rate of choice:"
      ]
    },
    {
      "metadata": {
        "id": "vheyanDkCg6a",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "'''TODO: Define the compile operation with your optimizer and learning rate of choice.\n",
        "We recommend beginning with the same optimizer and learning rate as the feed forward model'''\n",
        "cnn_model.compile(optimizer='''TODO''', # TODO\n",
        "              loss='sparse_categorical_crossentropy',\n",
        "              metrics=['accuracy'])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "U19bpRddC7H_",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Now we can train our model using the `fit` method via the Keras API:"
      ]
    },
    {
      "metadata": {
        "id": "YdrGZVmWDK4p",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "'''TODO: Train the CNN model'''\n",
        "cnn_model.fit('''TODO''') # TODO"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "pEszYWzgDeIc",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Great! Now that we've trained the model, let's evaluate it on the test dataset using the [`evaluate`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#evaluate) method:"
      ]
    },
    {
      "metadata": {
        "id": "JDm4znZcDtNl",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "'''TODO: Use the evaluate method to test the model!'''\n",
        "test_loss, test_acc = cnn_model.evaluate('''TODO''') # TODO\n",
        "\n",
        "print('Test accuracy:', test_acc)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "2rvEgK82Glv9",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "What is the highest accuracy you're able to achieve using the CNN model, and how does the accuracy of the CNN model compare to the accuracy of the simple fully connected network? What optimizers and learning rates seem to be optimal for training the CNN model? "
      ]
    },
    {
      "metadata": {
        "id": "xsoS7CPDCaXH",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "### Make predictions with the CNN model\n",
        "\n",
        "With the model trained, we can use it to make predictions about some images. We'll use the [`predict`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#predict) function call to generate the output predictions given a set of input samples.\n"
      ]
    },
    {
      "metadata": {
        "id": "Gl91RPhdCaXI",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "predictions = cnn_model.predict(test_images)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "x9Kk1voUCaXJ",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "With this function call, the model has predicted the label for each image in the testing set. Let's take a look at the prediction for the first image in the test dataset:"
      ]
    },
    {
      "metadata": {
        "id": "3DmJEUinCaXK",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "predictions[0]"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "-hw1hgeSCaXN",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "As you can see, a prediction is an array of 10 numbers. Recall that the output of our model is a probability distribution over the 10 digit classes. Thus, these numbers describe the model's \"confidence\" that the image corresponds to each of the 10 different digits. \n",
        "\n",
        "Let's look at the digit that has the highest confidence for the first image in the test dataset:"
      ]
    },
    {
      "metadata": {
        "id": "qsqenuPnCaXO",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "'''TODO: identify the digit with the highest confidence prediction for the first\n",
        "    image in the test dataset'''\n",
        "digit = # TODO"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "E51yS7iCCaXO",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "So, the model is most confident that this image is a \"7\". We can check the test label (remember, this is the true identity of the digit) to see if this prediction is correct:"
      ]
    },
    {
      "metadata": {
        "id": "Sd7Pgsu6CaXP",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "test_labels[0]"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "ygh2yYC972ne",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "It is!\n",
        "\n",
        "We can define a couple of functions to help visualize the classification results on the MNIST dataset. First, we'll write a function, `plot_image`, to plot images along with their predicted label and the probability of the prediction. Second, we'll also define a function, `plot_value_array`, to plot the prediction probabilities for each of the digits. "
      ]
    },
    {
      "metadata": {
        "id": "DvYmmrpIy6Y1",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "def plot_image(i, predictions_array, true_label, img):\n",
        "  predictions_array, true_label, img = predictions_array[i], true_label[i], img[i]\n",
        "  plt.grid(False)\n",
        "  plt.xticks([])\n",
        "  plt.yticks([])\n",
        "  \n",
        "  plt.imshow(np.squeeze(img), cmap=plt.cm.binary)\n",
        "\n",
        "  predicted_label = np.argmax(predictions_array)\n",
        "  if predicted_label == true_label:\n",
        "    color = 'blue'\n",
        "  else:\n",
        "    color = 'red'\n",
        "  \n",
        "  plt.xlabel(\"{} {:2.0f}% ({})\".format(predicted_label,\n",
        "                                100*np.max(predictions_array),\n",
        "                                true_label),\n",
        "                                color=color)\n",
        "\n",
        "def plot_value_array(i, predictions_array, true_label):\n",
        "  predictions_array, true_label = predictions_array[i], true_label[i]\n",
        "  plt.grid(False)\n",
        "  plt.xticks([])\n",
        "  plt.yticks([])\n",
        "  thisplot = plt.bar(range(10), predictions_array, color=\"#777777\")\n",
        "  plt.ylim([0, 1]) \n",
        "  predicted_label = np.argmax(predictions_array)\n",
        " \n",
        "  thisplot[predicted_label].set_color('red')\n",
        "  thisplot[true_label].set_color('blue')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "d4Ov9OFDMmOD",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Let's use these functions to visualize the model's predictions for the images in the test dataset: "
      ]
    },
    {
      "metadata": {
        "id": "HV5jw-5HwSmO",
        "colab_type": "code",
        "cellView": "both",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "#@title Change the slider to look at the model's predictions! { run: \"auto\" }\n",
        "\n",
        "image_index = 8 #@param {type:\"slider\", min:0, max:100, step:1}\n",
        "plt.subplot(1,2,1)\n",
        "plot_image(image_index, predictions, test_labels, test_images)\n",
        "plt.subplot(1,2,2)\n",
        "plot_value_array(image_index, predictions,  test_labels)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "kgdvGD52CaXR",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "We can also plot several images along with their predictions, where correct prediction labels are blue and incorrect prediction labels are red. The number gives the percent confidence (out of 100) for the predicted label. Note the model can be very confident in an incorrect prediction!"
      ]
    },
    {
      "metadata": {
        "id": "hQlnbqaw2Qu_",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "# Plot the first X test images, their predicted label, and the true label\n",
        "# Color correct predictions in blue, incorrect predictions in red\n",
        "num_rows = 5\n",
        "num_cols = 4\n",
        "num_images = num_rows*num_cols\n",
        "plt.figure(figsize=(2*2*num_cols, 2*num_rows))\n",
        "for i in range(num_images):\n",
        "  plt.subplot(num_rows, 2*num_cols, 2*i+1)\n",
        "  plot_image(i, predictions, test_labels, test_images)\n",
        "  plt.subplot(num_rows, 2*num_cols, 2*i+2)\n",
        "  plot_value_array(i, predictions, test_labels)\n"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "k-2glsRiMdqa",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "## 1.4 Training the model 2.0\n",
        "\n",
        "Earlier in the lab, we used the [`fit`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#fit) function call to train the model. This function is quite high-level and intuitive, which is really useful for simpler models. As you may be able to tell, this function abstracts away many details in the training call, and we have less control over training model, which could be useful in other contexts. \n",
        "\n",
        "As an alternative to this, we can use the [`tf.GradientTape`](https://www.tensorflow.org/api_docs/python/tf/GradientTape) class to record differentiation operations during training, and then call the [`tf.GradientTape.gradient`](https://www.tensorflow.org/api_docs/python/tf/GradientTape#gradient) function to actually compute the gradients. \n",
        "\n",
        "You may recall seeing this in Lab 1 Part 1, but let's take another look at this here.\n",
        "\n",
        "We'll use this framework to train our `cnn_model` using stochastic gradient descent."
      ]
    },
    {
      "metadata": {
        "id": "Wq34id-iN1Ml",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "# Rebuild the CNN model\n",
        "cnn_model = build_cnn_model()\n",
        "\n",
        "batch_size = 12\n",
        "loss_history = util.LossHistory(smoothing_factor=0.99) # to record the evolution of the loss\n",
        "plotter = util.PeriodicPlotter(sec=2, xlabel='Iterations', ylabel='Loss', scale='semilogy')\n",
        "optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-2) # define our optimizer\n",
        "\n",
        "bar = util.create_progress_bar()\n",
        "for idx in bar(range(0, train_images.shape[0],batch_size)):\n",
        "  # First grab a batch of training data and convert the input images to tensors\n",
        "  (images, labels) = (train_images[idx:idx+batch_size], train_labels[idx:idx+batch_size])\n",
        "  images = tf.convert_to_tensor(images, dtype=tf.float32)\n",
        "\n",
        "  # GradientTape to record differentiation operations\n",
        "  with tf.GradientTape() as tape:\n",
        "    logits = cnn_model(images) # feed the images into the model\n",
        "    loss_value = tf.keras.backend.sparse_categorical_crossentropy(labels, logits) # value of the loss\n",
        "\n",
        "  loss_history.append(loss_value.numpy().mean()) # append the loss to the loss_history record\n",
        "  plotter.plot(loss_history.get())\n",
        "  # Backpropagation\n",
        "  grads = tape.gradient(loss_value, cnn_model.variables)\n",
        "  optimizer.apply_gradients(zip(grads, cnn_model.variables),\n",
        "                            global_step=tf.train.get_or_create_global_step())\n"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "3cNtDhVaqEdR",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "## 1.5 Conclusion\n",
        "In this part of the lab, you had the chance to play with different MNIST classifiers with different architectures (fully-connected layers only, CNN), and experiment with how different hyperparameters affect accuracy (learning rate, etc.). The next part of the lab explores another application of CNNs, facial detection, and some drawbacks of AI systems in real world applications, like issues of bias. "
      ]
    }
  ]
}